Poster: donuil Date: Nov 1, 2008 11:21am
Forum: texts Subject: Re: Book digitized by Google and uploaded to the Internet Archive by user tpb

Yes, I was wondering about this too. Well, one can hardly fail to notice the size of the IA Texts Archive doubling rather quickly to over 1,000,000 items... and when one types in sponsor:"google", half of the Archive appears in the results.
I hope it is all... well... okay. But it seems to be at least semi-official, as there now is a collection homepage for Google Books: http://www.archive.org/details/googlebooks
I mean, Google would hardly fail to notice someone pulling 500,000 of their public domain books. And whoever uploaded them must have had vast processing power and storage space to convert them all to .djvu as well. Perhaps Google agreed to it?
I hadn't noticed that about the watermarks though.


Poster: stbalbach Date: Nov 2, 2008 3:12pm
Forum: texts Subject: Re: Book digitized by Google and uploaded to the Internet Archive by user tpb

Hi, thanks for the info. I had not noticed the Google sponsor category, which I guess makes it official. One thing I don't understand is why the books have such high download counts, as if they have been on Internet Archive for a long time already. For example:

http://www.archive.org/search.php?query=%28sponsor%3Agoogle%20AND%20mediatype%3Atexts%29

Books generally don't have such high download counts, in particular foreign language works, unless they've been around a while. Is it possible these books have been on IA for a while and attribution metadata was recently added? If so, how does that explain the sudden jump in total overall works to over 1 million? Were the books not counted as part of the total before?
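
For reference, the search URL above is percent-encoded, which makes the actual query hard to read. A minimal sketch for decoding it, using only Python's standard `urllib.parse` module (the variable names are just illustrative):

```python
from urllib.parse import urlsplit, parse_qs

# The search URL quoted in the post above.
url = ("http://www.archive.org/search.php"
       "?query=%28sponsor%3Agoogle%20AND%20mediatype%3Atexts%29")

# parse_qs splits the query string and undoes the percent-encoding.
query = parse_qs(urlsplit(url).query)["query"][0]
print(query)  # (sponsor:google AND mediatype:texts)
```

So the search is simply every item where the sponsor field is "google" and the media type is "texts".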

This post was modified by stbalbach on 2008-11-02 23:12:32


Poster: stbalbach Date: Nov 2, 2008 3:26pm
Forum: texts Subject: Re: Book digitized by Google and uploaded to the Internet Archive by user tpb

Well, partly to answer my own question: looking at the full metadata record (under the "HTML files" link), it appears the "publicdate" field is recent, within the past few months, so they are recent additions. It's odd that the download count is so high. This one is representative of most of them:

http://www.archive.org/details/aabnegao00amorgoog

Downloaded 45 times in the past few months. Strangely popular for an obscure Spanish-language 19th-century poet. Perhaps there is something going on internally, either with the hit stats or with mirroring by other libraries.

This post was modified by stbalbach on 2008-11-02 23:26:49


Poster: donuil Date: Nov 3, 2008 12:19am
Forum: texts Subject: Re: Book digitized by Google and uploaded to the Internet Archive by user tpb

Oh, look, I came across another collection page, 'European Libraries'. http://www.archive.org/details/europeanlibraries
Seems to be all the Bodleian/Oxford University books from Google.

Re: the download numbers. I have never trusted the download numbers (that is, well before Google was added to IA); some seem completely absurd, and I had always just assumed it was the result of some sort of automation like crawling, mirroring, or whatever.

Thank you, by the way, for pointing out the metadata within the HTTP folder. I am in and out of those folders all the time and never noticed the full metadata was there!


Poster: stbalbach Date: Nov 3, 2008 6:02am
Forum: texts Subject: Re: Book digitized by Google and uploaded to the Internet Archive by user tpb

Excellent, thanks. Hopefully they will list these libraries/collections on the main Text page with the others.

Re: download counts, yeah, who knows. I guess we'll need an insider's view to explain it.

You're welcome on the metadata. I just found it on a whim, thinking it must be around somewhere: since the main work page says "select" metadata, the rest had to be somewhere. I kinda wish they'd open it up to the community, à la Wikipedia or LibraryThing, to correct errors and add info.


Poster: donuil Date: Nov 3, 2008 12:28am
Forum: texts Subject: Re: Book digitized by Google and uploaded to the Internet Archive by user tpb

Addition: The European Libraries also contain books from Lausanne and Ghent Universities, although Oxford makes up the vast majority.